MODULE 2:   Finite Automata
Deterministic Finite Automata

In this chapter we go over the formal definitions and proofs related to the deterministic finite automata (DFA) computational model. Our main goal is to use the DFA model as a warm-up towards the more general Turing machine model. That being said, the DFA model is interesting in its own right. Below you can find some of the applications of deterministic finite automata:

1
Basic Definitions
Computational Model

Anything that processes information can be called a computer. However, there can be restrictions on how the information can be processed (either universal restrictions imposed by, say, the laws of physics, or restrictions imposed by the particular setting we are interested in).

A computational model is a set of allowed rules for information processing. Given a computational model, we can talk about the computers (or machines) allowed by the computational model. A computer is a specific instantiation of the computational model, or in other words, it is a specific collection of information processing rules allowed by the computational model.

Note that even though the terms “computer” and “machine” suggest a physical device, in this course we are not interested in physical representations that realize computers, but rather in the mathematical representations. The term algorithm is synonymous to computer (and machine) and makes it more clear that we are not necessarily talking about a physical device.

In this section we give the basic definitions regarding the computational model known as deterministic finite automata.

Definition (Deterministic Finite Automaton (DFA))

A deterministic finite automaton (DFA) MM is a 55-tuple M=(Q,Σ,δ,q0,F),M = (Q, \Sigma, \delta, q_0, F), where

  • QQ is a non-empty finite set (which we refer to as the set of states of the DFA);

  • Σ\Sigma is a non-empty finite set (which we refer to as the alphabet of the DFA);

  • δ\delta is a function of the form δ:Q×ΣQ\delta: Q \times \Sigma \to Q (which we refer to as the transition function of the DFA);

  • q0Qq_0 \in Q is an element of QQ (which we refer to as the start state of the DFA);

  • FQF \subseteq Q is a subset of QQ (which we refer to as the set of accepting states of the DFA).

Note (State diagram of a DFA)

It is very common to represent a DFA with a state diagram. Below is an example of how we draw a state diagram of a DFA:

image

In this example, Σ={0,1}\Sigma = \{{\texttt{0}},{\texttt{1}}\}, Q={q0,q1,q2,q3}Q = \{q_0,q_1,q_2,q_3\}, F={q1,q2}F = \{q_1,q_2\}. The labeled arrows between the states encode the transition function δ\delta, which can also be represented with a table as shown below (row qiQq_i \in Q and column bΣb \in \Sigma contains δ(qi,b)\delta(q_i, b)).

image

The start state is labeled with q0q_0, but if another label is used, we can identify the start state in the state diagram by identifying the state with an arrow that does not originate from another state and goes into that state.

Remark (The labels of DFA components)

In the definition above, we used the labels Q,Σ,δ,q0,FQ, \Sigma, \delta, q_0, F. One can of course use other labels when defining a DFA as long as it is clear what each label represents.

Remark (Equivalence of DFAs)

We’ll consider two DFAs to be equivalent/same if they are the same machine up to renaming the elements of the sets QQ and Σ\Sigma. For instance, the two DFAs below are considered to be the same even though they have different labels on the states and use different alphabets.

image

The flexibility with the choice of labels allows us to be more descriptive when we define a DFA. For instance, we can give labels to the states that communicate the “purpose” or “role” of each state.

Definition (Computation path for a DFA)

Let M=(Q,Σ,δ,q0,F)M = (Q, \Sigma, \delta, q_0, F) be a DFA and let w=w1w2wnw = w_1w_2 \cdots w_n be a string over an alphabet Σ\Sigma (so wiΣw_i \in \Sigma for each i{1,2,,n}i \in \{1,2,\ldots,n\}). The computation path of MM with respect to ww is a specific sequence of states r0,r1,r2,,rnr_0, r_1, r_2, \ldots , r_n (where each riQr_i \in Q). We’ll often write this sequence as follows.

w1w_1 w2w_2 \ldots wnw_n
r0r_0 r1r_1 r2r_2 \ldots rnr_n

 
These states correspond to the states reached when the DFA reads the input ww one character at a time. This means r0=q0r_0 = q_0, and i{1,2,,n},δ(ri1,wi)=ri.\forall i \in \{1,2,\ldots,n\}, \quad \delta(r_{i-1},w_i) = r_i. An accepting computation path is such that rnFr_n \in F, and a rejecting computation path is such that rn∉Fr_n \not\in F.

We say that MM accepts a string wΣw \in \Sigma^* (or “M(w)M(w) accepts”, or “M(w)=1M(w) = 1”) if the computation path of MM with respect to ww is an accepting computation path. Otherwise, we say that MM rejects the string ww (or “M(w)M(w) rejects”, or “M(w)=0M(w) = 0”).

Example (An example of a computation path)

Let M=(Q,Σ,δ,q0,F)M = (Q, \Sigma, \delta, q_0, F) be the DFA in Note (State diagram of a DFA). For ease of reference, we present the state diagram once again here.

image

For input string w=110110w = {\texttt{110110}}, the computation path of MM with respect to ww is

1{\texttt{1}} 1{\texttt{1}} 0{\texttt{0}} 1{\texttt{1}} 1{\texttt{1}} 0{\texttt{0}}
q0q_0 q1q_1 q2q_2 q3q_3 q2q_2 q2q_2 q3q_3

 
Since q3q_3 is not in FF, this is a rejecting computation path, and therefore MM rejects ww, i.e. M(w)=0M(w) = 0.

Note (Extended transition function)

Let M=(Q,Σ,δ,q0,F)M = (Q, \Sigma, \delta, q_0, F) be a DFA. The transition function δ:Q×ΣQ\delta : Q \times \Sigma \to Q extends to δ:Q×ΣQ\delta^* : Q \times \Sigma^* \to Q, where δ(q,w)\delta^*(q, w) is defined as the state we end up in if we start at qq and read the string w=w1wnw = w_1 \ldots w_n. Or in other words, δ(q,w)=δ(δ(δ(δ(q,w1),w2),w3),wn).\delta^*(q, w) = \delta(\ldots\delta(\delta(\delta(q,w_1),w_2),w_3)\ldots, w_n). The star in the notation can be dropped and δ\delta can be overloaded to represent both a function δ:Q×ΣQ\delta : Q \times \Sigma \to Q and a function δ:Q×ΣQ\delta : Q \times \Sigma^* \to Q. Using this notation, a word ww is accepted by the DFA MM if δ(q0,w)F\delta(q_0, w) \in F.

Definition (DFA solving a decision problem or a language)

Let f:Σ{0,1}f: \Sigma^* \to \{0,1\} be a decision problem and let MM be a DFA over the same alphabet. We say that MM solves (or decides, or computes) ff if the input/output behavior of MM matches ff exactly, in the following sense: for all wΣw \in \Sigma^*, M(w)=f(w)M(w) = f(w).

If LL is the language corresponding to ff, the above definition is equivalent to saying that MM solves (or decides, or computes) LL if the following holds:

  • if wLw \in L, then MM accepts ww (i.e. M(w)=1M(w) = 1);

  • if w∉Lw \not \in L, then MM rejects ww (i.e. M(w)=0M(w) = 0).

Example (Even number of 1{\texttt{1}}’s)

The following DFA solves the language consisting of all binary strings that contain an even number of 1{\texttt{1}}’s.

image

Example (Ends with 00{\texttt{00}})

The following DFA solves the language consisting of all binary strings that end with 00{\texttt{00}}.

image

Note (Uniqueness of the language of a DFA)

For a DFA MM, there is a unique language LL that MM solves. This language is often denoted by L(M)L(M) and is referred to as the language of MM. Note that L(M)={wΣ:M accepts w}.L(M) = \{w \in \Sigma^* : \text{$M$ accepts $w$}\}.

Exercise (Describe the language of a DFA)

For each DFA MM below, define L(M)L(M).

  1. image

  2. image

Solution
  1. L(M)={x{0,1}:x ends with a 0}{ϵ}L(M) = \{x \in \{{\texttt{0}},{\texttt{1}}\}^* : \text{$x$ ends with a {\texttt{0}}}\} \cup \{\epsilon\}.

  2. L(M)={a,b,cb,cc}L(M) = \{{\texttt{a}}, {\texttt{b}}, {\texttt{cb}}, {\texttt{cc}}\}.

Exercise (Draw DFAs)

For each language below (over the alphabet Σ={0,1}\Sigma = \{{\texttt{0}},{\texttt{1}}\}), draw a DFA solving it.

  1. {101,110}\{{\texttt{101}}, {\texttt{110}}\}

  2. {0,1}{101,110}\{{\texttt{0}},{\texttt{1}}\}^* \setminus \{{\texttt{101}}, {\texttt{110}}\}

  3. {x{0,1}:x starts and ends with the same bit}\{x \in \{{\texttt{0}},{\texttt{1}}\}^* : \text{$x$ starts and ends with the same bit}\}

  4. {110}={ϵ,110,110110,110110110,}\{{\texttt{110}}\}^* = \{\epsilon, {\texttt{110}}, {\texttt{110110}}, {\texttt{110110110}}, \ldots\}

  5. {x{0,1}:x contains 110 as a substring}\{x \in \{{\texttt{0}},{\texttt{1}}\}^*: \text{$x$ contains ${\texttt{110}}$ as a substring}\}

Solution
  1. Below, all missing transitions go to a rejecting sink state (so the DFA actually has 6 states in total).

    image

  2. Take the DFA above and flip the accepting and rejecting states.

  3.  

    image

  4. Below, all missing transitions go to a rejecting sink state.

    image

  5.  

    image

Exercise (Finite languages can be solved by DFAs)

Let LL be a finite language, i.e., it contains a finite number of words. At a high level, describe why there is a DFA solving LL.

Solution

It is a good idea to first think about whether languages of size 11 are regular. Are languages of size 22 regular? When you draw DFAs for such languages, the basic idea is to “hard-code” the words in the language into the state-diagram of the DFA (this is what we did for part 1 of Exercise (Draw DFAs)). So for each word in the language, there would be a path in the DFA corresponding to that word that ends in an accepting state. This idea works whenever the language in consideration has a finite number of words. The number of states in the DFA will depend on the number of words in the language and the length of those words, but since the language is finite (and each word has finite-length), all the words in the language can be hard-coded using a finite number of states.

Definition (Regular language)

A language LΣL \subseteq \Sigma^* is called a regular language if there is a deterministic finite automaton MM that solves LL.

Example (Some examples of regular languages)

All the languages in Exercise (Draw DFAs) are regular languages.

Exercise (Equal number of 01{\texttt{01}}’s and 10{\texttt{10}}’s)

Let Σ={0,1}\Sigma = \{{\texttt{0}},{\texttt{1}}\}. Is the following language regular? L={w{0,1}:w has an equal number of occurrences of 01 and 10 as substrings}L = \{w \in \{{\texttt{0}},{\texttt{1}}\}^* : w \text{ has an equal number of occurrences of ${\texttt{01}}$ and ${\texttt{10}}$ as substrings}\}

Hint

The language is regular. Go through some examples to see if you can notice a pattern and come up with an alternate description for the language.

Solution

The answer is yes because the language is pretty much the same as the language in Exercise (Draw DFAs), part 3 (except that the starting state should also be accepting).

Definition (Complexity class REG\mathsf{REG})

We denote by REG\mathsf{REG} the set of all regular languages (over the default alphabet Σ={0,1}\Sigma = \{{\texttt{0}}, {\texttt{1}}\}).

2
Non-Regular Languages
Theorem (0n1n{\texttt{0}}^n {\texttt{1}}^n is not regular)

Let Σ={0,1}\Sigma = \{{\texttt{0}},{\texttt{1}}\}. The language L={0n1n:nN}L = \{{\texttt{0}}^n {\texttt{1}}^n: n \in \mathbb N\} is not regular.

Proof

Our goal is to show that L={0n1n:nN}L = \{{\texttt{0}}^n {\texttt{1}}^n: n \in \mathbb N\} is not regular. The proof is by contradiction. So let’s assume that LL is regular.

Since LL is regular, by definition, there is some deterministic finite automaton MM that solves LL. Let kk denote the number of states of MM. Consider the following set of k+1k+1 strings: P={0n:n{0,1,,k}}P = \{{\texttt{0}}^n : n \in \{0,1,\ldots,k\}\}. Each string in PP, when fed as input to MM, ends up in one of the kk states of MM. By the pigeonhole principle (thinking of the strings as pigeons and the states as holes), we know that there must be two strings in PP that end up in the same state. In other words, there are i,j{0,1,,k}i, j \in \{0,1,\ldots,k\}, with iji \neq j, such that the string 0i{\texttt{0}}^i and the string 0j{\texttt{0}}^j end up in the same state. This implies that for any string w{0,1}w \in \{{\texttt{0}},{\texttt{1}}\}^*, 0iw{\texttt{0}}^iw and 0jw{\texttt{0}}^jw end up in the same state. We’ll now reach a contradiction, and conclude the proof, by considering a particular ww such that 0iw{\texttt{0}}^iw and 0jw{\texttt{0}}^jw end up in different states.

Consider the string w=1iw = {\texttt{1}}^i. Then since MM solves LL, we know 0iw=0i1i{\texttt{0}}^iw = {\texttt{0}}^i{\texttt{1}}^i must end up in an accepting state. On the other hand, since iji \neq j, 0jw=0j1i{\texttt{0}}^jw = {\texttt{0}}^j{\texttt{1}}^i is not in the language, and therefore cannot end up in an accepting state. This is the desired contradiction.

Exercise (Would any set of pigeons work?)

In the proof of the above theorem, we defined the set P={0n:n{0,1,,k}}P = \{{\texttt{0}}^n : n \in \{0,1,\ldots,k\}\} and then applied the pigeonhole principle. Explain why picking the following sets would not have worked in that proof.

  1. P={1n:n{1,2,,k+1}}P = \{{\texttt{1}}^n : n \in \{1,2,\ldots,k+1\}\}

  2. P={1,11,000,0000,00000,,0k+1}P = \{{\texttt{1}}, {\texttt{11}}, {\texttt{000}}, {\texttt{0000}}, {\texttt{00000}}, \ldots, {\texttt{0}}^{k+1}\}

Solution
  1. With P={1n:n{1,2,,k+1}}P = \{{\texttt{1}}^n : n \in \{1,2,\ldots,k+1\}\} we can still apply the pigeonhole principle to conclude that there are two strings 1i{\texttt{1}}^i and 1j{\texttt{1}}^j, iji \neq j, that must end up in the same state. However, to reach a contradiction, we need to find a string w{0,1}w \in \{{\texttt{0}}, {\texttt{1}}\}^* such that exactly one of 1iw{\texttt{1}}^iw and 1jw{\texttt{1}}^jw is in the language, and the other is not. On the other hand, any string starting with a 1{\texttt{1}} is not in the language.

  2. The argument is the same as above, with the key observation being the following. When we apply the pigeonhole principle to conclude that two strings in PP must end up in the same state, we have no control over which two strings in PP end up in the same state. Therefore, our argument must work no matter which two strings in PP end up in the same state. For the PP given to us, the two strings that end up in the same state could be 1{\texttt{1}} and 11{\texttt{11}}, and so we would run into the same problem as in the previous part.

Exercise (c251anb2n{\texttt{c}}^{251}{\texttt{a}}^n{\texttt{b}}^{2n} is not regular)

Let Σ={a,b,c}\Sigma = \{{\texttt{a}},{\texttt{b}},{\texttt{c}}\}. Prove that L={c251anb2n:nN}L = \{{\texttt{c}}^{251} {\texttt{a}}^n {\texttt{b}}^{2n}: n \in \mathbb N\} is not regular.

Solution

Our goal is to show that L={c251anb2n:nN}L = \{{\texttt{c}}^{251} {\texttt{a}}^n {\texttt{b}}^{2n}: n \in \mathbb N\} is not regular. The proof is by contradiction. So let’s assume that LL is regular.

Since LL is regular, by definition, there is some deterministic finite automaton MM that solves LL. Let kk denote the number of states of MM. For nNn \in \mathbb N, let rnr_n denote the state that MM reaches after reading c251an{\texttt{c}}^{251}{\texttt{a}}^n. By the pigeonhole principle, we know that there must be a repeat among r0,r1,,rkr_0, r_1,\ldots, r_k. In other words, there are i,j{0,1,,k}i, j \in \{0,1,\ldots,k\} with iji \neq j such that ri=rjr_i = r_j. This means that the string c251ai{\texttt{c}}^{251}{\texttt{a}}^i and the string c251aj{\texttt{c}}^{251}{\texttt{a}}^j end up in the same state in M.M. Therefore, c251aiw{\texttt{c}}^{251}{\texttt{a}}^iw and c251ajw{\texttt{c}}^{251}{\texttt{a}}^jw, for any string w{a,b,c}w \in \{{\texttt{a}},{\texttt{b}},{\texttt{c}}\}^*, end up in the same state in MM. We’ll now reach a contradiction, and conclude the proof, by considering a particular ww such that c251aiw{\texttt{c}}^{251}{\texttt{a}}^iw and c251ajw{\texttt{c}}^{251}{\texttt{a}}^jw end up in different states.

Consider the string w=b2iw = {\texttt{b}}^{2i}. Then since MM solves LL, we know c251aiw=c251aib2i{\texttt{c}}^{251}{\texttt{a}}^iw = {\texttt{c}}^{251}{\texttt{a}}^i{\texttt{b}}^{2i} must end up in an accepting state. On the other hand, since iji \neq j, c251ajw=c251ajb2i{\texttt{c}}^{251}{\texttt{a}}^jw = {\texttt{c}}^{251}{\texttt{a}}^j{\texttt{b}}^{2i} is not in the language, and therefore cannot end up in an accepting state. This is the desired contradiction.

Exercise (A fooling pigeon set)

In Exercise (Would any set of pigeons work?) we saw that the “pigeon set” that we use to apply the pigeonhole principle must be chosen carefully. We’ll call a pigeon set a fooling pigeon set if it is a pigeon set that “works”. That is, given a DFA with kk states that is assumed to solve a non-regular LL, a fooling pigeon set of size k+1k+1 allows us to carry out the contradiction proof, and conclude that LL is non-regular. Identify the property that a fooling pigeon set should have.

Solution

A fooling pigeon set PP is such that for all x,yPx, y \in P, there exists a zΣz \in \Sigma^* such that exactly one of xzxz and yzyz is in LL, and the other is not. (Note that the choice of zz for different pairs xx and yy may be different.)

The crux of a non-regularity proof is to show that for any kNk \in \mathbb N (which denotes the number of states of a DFA that is assumed to solve LL), there is a fooling pigeon set of size at least k+1k+1. Since kk is arbitrary, showing that a language LL is non-regular really amounts to finding an infinite-size fooling pigeon set. In the case of the language L={0n1n:nN}L = \{{\texttt{0}}^n {\texttt{1}}^n: n \in \mathbb N\}, the infinite-size fooling pigeon set is {0n:nN}\{{\texttt{0}}^n : n \in \mathbb N\}.

Theorem (A unary non-regular language)

Let Σ={a}\Sigma = \{{\texttt{a}}\}. The language L={a2n:nN}L = \{{\texttt{a}}^{2^n}: n \in \mathbb N\} is not regular.

Proof

Our goal is to show that L={a2n:nN}L = \{{\texttt{a}}^{2^n}: n \in \mathbb N\} is not regular. The proof is by contradiction. So let’s assume that LL is regular.

Since LL is regular, by definition, there is some deterministic finite automaton MM that solves LL. Let kk denote the number of states of MM. For nNn \in \mathbb N, let rnr_n denote the state that MM reaches after reading a2n{\texttt{a}}^{2^n} (i.e. rn=δ(q0,a2n)r_n = \delta(q_0, {\texttt{a}}^{2^n})). By the pigeonhole principle, we know that there must be a repeat among r0,r1,,rkr_0, r_1,\ldots, r_k (a sequence of k+1k+1 states). In other words, there are indices i,j{0,1,,k}i, j \in \{0,1,\ldots,k\} with i<ji < j such that ri=rjr_i = r_j. This means that the string a2i{\texttt{a}}^{2^i} and the string a2j{\texttt{a}}^{2^j} end up in the same state in MM. Therefore, a2iw{\texttt{a}}^{2^i}w and a2jw{\texttt{a}}^{2^j}w, for any string w{a}w \in \{{\texttt{a}}\}^*, end up in the same state in MM. We’ll now reach a contradiction, and conclude the proof, by considering a particular ww such that a2iw{\texttt{a}}^{2^i}w ends up in an accepting state but a2jw{\texttt{a}}^{2^j}w ends up in a rejecting state (i.e. they end up in different states).

Consider the string w=a2iw = {\texttt{a}}^{2^i}. Then a2iw=a2ia2i=a2i+1{\texttt{a}}^{2^i}w = {\texttt{a}}^{2^i}{\texttt{a}}^{2^i} = {\texttt{a}}^{2^{i+1}}, and therefore must end up in an accepting state. On the other hand, a2jw=a2ja2i=a2j+2i{\texttt{a}}^{2^j}w = {\texttt{a}}^{2^j}{\texttt{a}}^{2^i} = {\texttt{a}}^{2^j + 2^i}. We claim that this word must end up in a rejecting state because 2j+2i2^j + 2^i cannot be written as a power of 22 (i.e., cannot be written as 2t2^t for some tNt \in \mathbb N). To see this, note that since i<ji < j, we have 2j<2j+2i<2j+2j=2j+1,2^j < 2^j + 2^i < 2^j + 2^j = 2^{j+1}, which implies that if 2j+2i=2t2^j + 2^i = 2^t, then j<t<j+1j < t < j+1 (which is impossible). So 2j+2i2^j + 2^i cannot be written as 2t2^t for tNt \in \mathbb N, and therefore a2j+2i{\texttt{a}}^{2^j + 2^i} leads to a reject state in MM as claimed.

Note (Lower-bounding the number of states)

In the next exercise, we’ll write the proof in a slightly different way to offer a slightly different perspective. In particular, we’ll phrase the proof such that the goal is to show no DFA solving LL can have a finite number of states. This is done by identifying an infinite set of strings that all must end up in a different state (this is the fooling pigeon set that we defined in Exercise (A fooling pigeon set)). Once you have a set of strings SS such that every string in SS must end up in a different state, we can conclude any DFA solving the language must have at least S|S| states.

This slight change in phrasing of the non-regularity proof makes it clear that even if LL is regular, the technique can be used to prove a lower bound on the number of states of any DFA solving LL.

Exercise (anbncn{\texttt{a}}^n{\texttt{b}}^n{\texttt{c}}^n is not regular)

Let Σ={a,b,c}\Sigma = \{{\texttt{a}},{\texttt{b}},{\texttt{c}}\}. Prove that L={anbncn:nN}L = \{{\texttt{a}}^n{\texttt{b}}^n{\texttt{c}}^n: n \in \mathbb N\} is not regular.

Solution

Our goal is to show that L={anbncn:nN}L = \{{\texttt{a}}^n{\texttt{b}}^n{\texttt{c}}^n: n \in \mathbb N\} is not regular. Consider the set of strings P={an:nN}P = \{{\texttt{a}}^n : n \in \mathbb N\}. We claim that in any DFA solving LL, no two strings in PP can end up in the same state. To prove this claim, let’s go by contradiction, and assume that there are two strings in PP, ai{\texttt{a}}^i and aj{\texttt{a}}^j, iji \neq j, that end up in the same state. Then for any wΣw \in \Sigma^*, the strings aiw{\texttt{a}}^iw and ajw{\texttt{a}}^jw must end up in the same state. But for w=biciw = {\texttt{b}}^i{\texttt{c}}^i, aiw=aibici{\texttt{a}}^iw = {\texttt{a}}^i{\texttt{b}}^i{\texttt{c}}^i must end up in an accepting state, whereas ajw=ajbici{\texttt{a}}^jw = {\texttt{a}}^j{\texttt{b}}^i{\texttt{c}}^i must end up in a rejecting state. This is a contradiction.

Since we have identified an infinite set of strings that all must end up in a different state, we conclude that there cannot be a DFA solving LL, since by definition, DFAs have a finite number of states.

3
Closure Properties of Regular Languages

In this section we will be interested in the following question. Given regular languages, what operations can we apply to them (e.g. union, intersection, concatenation, etc.) so that the resulting language is also regular?

Exercise (Are regular languages closed under complementation?)

Is it true that if LL is regular, then its complement ΣL\Sigma^* \setminus L is also regular? In other words, are regular languages closed under the complementation operation?

Hint

The answer is yes. How can you modify the DFA for LL to construct a DFA for ΣL\Sigma^* \setminus L?

Solution

Yes. If LL is regular, then there is a DFA M=(Q,Σ,δ,q0,F)M = (Q, \Sigma, \delta, q_0, F) solving LL. The complement of LL is solved by the DFA M=(Q,Σ,δ,q0,QF)M = (Q, \Sigma, \delta, q_0, Q\setminus F). Take a moment to observe that this exercise allows us to say that a language is regular if and only if its complement is regular. Equivalently, a language is not regular if and only if its complement is not regular.

Exercise (Are regular languages closed under subsets?)

Is it true that if LΣL \subseteq \Sigma^* is a regular language, then any LLL' \subseteq L is also a regular language?

Hint

The answer is no. Try to think of a counter example.

Solution

No. For example, L=ΣL = \Sigma^* is a regular language (construct a single state DFA in which the state is accepting). On the other hand, by Theorem (0n1n{\texttt{0}}^n {\texttt{1}}^n is not regular), {0n1n:nN}{0,1}\{{\texttt{0}}^n{\texttt{1}}^n : n \in \mathbb N\} \subseteq \{{\texttt{0}},{\texttt{1}}\}^* is not regular.

Theorem (Regular languages are closed under union)

Let Σ\Sigma be some finite alphabet. If L1ΣL_1 \subseteq \Sigma^* and L2ΣL_2 \subseteq \Sigma^* are regular languages, then the language L1L2L_1 \cup L_2 is also regular.

Proof

Given regular languages L1L_1 and L2L_2, we want to show that L1L2L_1 \cup L_2 is regular. Since L1L_1 and L2L_2 are regular languages, by definition, there are DFAs M=(Q,Σ,δ,q0,F)M = (Q, \Sigma, \delta, q_0, F) and M=(Q,Σ,δ,q0,F)M' = (Q', \Sigma, \delta', q'_0, F') that solve L1L_1 and L2L_2 respectively (i.e. L(M)=L1L(M) = L_1 and L(M)=L2L(M') = L_2). To show L1L2L_1 \cup L_2 is regular, we’ll construct a DFA M=(Q,Σ,δ,q0,F)M'' = (Q'', \Sigma, \delta'', q''_0, F'') that solves L1L2L_1 \cup L_2. The definition of MM'' will make use of MM and MM'.

The idea behind the construction of MM'' is as follows. To figure out if a string ww is in L1L2L_1 \cup L_2, we can run M(w)M(w) and M(w)M'(w) to see if at least one of them accepts. However, that would require scanning ww twice, which is something a DFA cannot do. Therefore, the main trick is to simulate both M(w)M(w) and M(w)M'(w) simultaneously. For this, we can imagine having one thread for M(w)M(w) and another thread for M(w)M'(w) that we run together. We can write the computation paths corresponding to the threads as follows.

w1w_1 w2w_2 w3w_3 \ldots wnw_n
thread 1: r0r_0 r1r_1 r2r_2 r3r_3 \ldots rnr_n
thread 2: s0s_0 s1s_1 s2s_2 s3s_3 \ldots sns_n

 
Here, thread 1 represents a computation path for MM and thread 2 represents a computation path for MM'. We want the above to represent the computation path of a single DFA MM'' (and we want to know if one of the threads is an accepting computation path). We can accomplish this by viewing the combination of states (ri,si)(r_i, s_i) as a single state of MM''. And when we read a symbol wi+1w_{i+1}, we update rir_i according to the transition function of MM, and we update sis_i according to the transition function of MM'. In more detail:

  • The set of states is Q=Q×Q={(q,q):qQ,qQ}Q'' = Q \times Q' = \{(q, q') : q \in Q, q' \in Q'\}. (Note that in the definition of a DFA, we can use any fixed finite set as the set of states. It does not matter if a state is represented as a tuple or some other object; you could rename them to be of the form qiq_i'' if you wanted.)

  • The transition function δ\delta'' is defined such that for all (q,q)Q(q,q') \in Q'' and for all σΣ\sigma \in \Sigma, δ((q,q),σ)=(δ(q,σ),δ(q,σ)).\delta''((q,q'), \sigma) = (\delta(q,\sigma), \delta'(q',\sigma)). (Note that for wΣw \in \Sigma^*, δ((q,q),w)=(δ(q,w),δ(q,w))\delta''((q,q'), w) = (\delta(q,w), \delta'(q', w)).)

  • The initial state is q0=(q0,q0)q''_0 = (q_0, q'_0).

  • The set of accepting states is F={(q,q):qF or qF}F'' = \{(q, q') : q \in F \text{ or } q' \in F'\}.

This completes the definition of MM''. It remains to show that MM'' indeed solves the language L1L2L_1 \cup L_2, i.e. L(M)=L1L2L(M'') = L_1 \cup L_2. We will first argue that L1L2L(M)L_1 \cup L_2 \subseteq L(M'') and then argue that L(M)L1L2L(M'') \subseteq L_1 \cup L_2. Both inclusions will follow easily from the definition of MM'' and the definition of a DFA accepting a string.

L1L2L(M)L_1 \cup L_2 \subseteq L(M''): Suppose wL1L2w \in L_1 \cup L_2, which means ww either belongs to L1L_1 or it belongs to L2L_2. Our goal is to show that wL(M)w \in L(M''). Without loss of generality, assume ww belongs to L1L_1, or in other words, MM accepts ww (the argument is essentially identical when ww belongs to L2L_2). So we know that δ(q0,w)F\delta(q_0, w) \in F. By the definition of δ\delta'', δ((q0,q0),w)=(δ(q0,w),δ(q0,w))\delta''((q_0, q'_0), w) = (\delta(q_0,w), \delta'(q'_0, w)). And since δ(q0,w)F\delta(q_0, w) \in F, (δ(q0,w),δ(q0,w))F(\delta(q_0,w), \delta'(q'_0, w)) \in F'' (by the definition of FF''). So ww is accepted by MM'' as desired.

L(M)L1L2L(M'') \subseteq L_1 \cup L_2: Suppose that wL(M)w \in L(M''). Our goal is to show that wL1w \in L_1 or wL2w \in L_2. Since ww is accepted by MM'', we know that δ((q0,q0),w)=(δ(q0,w),δ(q0,w))F\delta''((q_0, q'_0), w) = (\delta(q_0,w), \delta'(q'_0, w)) \in F''. By the definition of FF'', this means that either δ(q0,w)F\delta(q_0, w) \in F or δ(q0,w)F\delta'(q'_0, w) \in F', i.e., ww is accepted by MM or MM'. This implies that either wL(M)=L1w \in L(M) = L_1 or wL(M)=L2w \in L(M') = L_2, as desired.

Corollary (Regular languages are closed under intersection)

Let Σ\Sigma be some finite alphabet. If L1ΣL_1 \subseteq \Sigma^* and L2ΣL_2 \subseteq \Sigma^* are regular languages, then the language L1L2L_1 \cap L_2 is also regular.

Proof

We want to show that regular languages are closed under the intersection operation. We know that regular languages are closed under union and closed under complementation. The result then follows since AB=ABA \cap B = \overline{\overline{A} \cup \overline{B}}.

Exercise (Direct proof that regular languages are closed under difference)

Give a direct proof (without using the fact that regular languages are closed under complementation, union and intersection) that if L1L_1 and L2L_2 are regular languages, then L1L2L_1 \setminus L_2 is also regular.

Solution

The proof is very similar to the proof of Theorem (Regular languages are closed under union). The only difference is the definition of FF'', which now needs to be defined as F={(q,q):qF and qQF}.F'' = \{(q, q') : q \in F \text{ and } q' \in Q' \setminus F'\}. The argument that L(M)=L(M)L(M)L(M'') = L(M) \setminus L(M') needs to be slightly adjusted in order to agree with FF''.

Exercise (Finite vs infinite union)
  1. Suppose L1,,LkL_1, \ldots, L_k are all regular languages. Is it true that their union i=0kLi\bigcup_{i=0}^k L_i must be a regular language?

  2. Suppose L0,L1,L2,L_0, L_1, L_2, \ldots is an infinite sequence of regular languages. Is it true that their union i0Li\bigcup_{i\geq 0} L_i must be a regular language?

Hint

The first one is “yes” (think induction) and the second one is “no” (give a counter-example).

Solution

In part 1, we are asking whether a finite union of regular languages is regular. The answer is yes, and this can be proved using induction, where the base case corresponds to a single regular language, and the induction step corresponds to Theorem (Regular languages are closed under union). In part 2, we are asking whether a countably infinite union of regular languages is regular. The answer is no. First note that any language of cardinality 1 is regular, i.e., {w}\{w\} for any wΣw \in \Sigma^* is a regular language. In particular, for any nNn \in \mathbb N, the language Ln={0n1n}L_n = \{{\texttt{0}}^n{\texttt{1}}^n\} of cardinality 1 is regular. But n0Ln={0n1n:nN}\bigcup_{n\geq 0} L_n = \{{\texttt{0}}^n{\texttt{1}}^n : n \in \mathbb N\} is not regular.

Exercise (Union of non-regular languages)

Suppose L1L_1 and L2L_2 are not regular languages. Is it always true that L1L2L_1 \cup L_2 is not a regular language?

Hint

No. Come up with a counter-example. You can try to find one such that L1L2L_1 \cup L_2 is Σ\Sigma^*.

Solution

The answer is no. Consider L={0n1n:nN}L = \{{\texttt{0}}^n{\texttt{1}}^n : n \in \mathbb N\}, which is a non-regular language. Furthermore, the complement of LL, which is L=ΣL\overline{L} = \Sigma^* \setminus L, is non-regular. This is because regular languages are closed under complementation (Exercise (Are regular languages closed under complementation?)), so if L\overline{L} was regular, then L=L\overline{\overline{L}} = L would also have to be regular. The union of LL and L\overline{L} is Σ\Sigma^*, which is a regular language.

Exercise (Regularity of suffixes and prefixes)

Suppose LΣL \subseteq \Sigma^* is a regular language. Show that the following languages are also regular: SUFFIXES(L)={xΣ:yxL for some yΣ},PREFIXES(L)={yΣ:yxL for some xΣ}.\begin{aligned} \text{SUFFIXES}(L) & = \{x \in \Sigma^* : \text{$yx \in L$ for some $y \in \Sigma^*$}\}, \\ \text{PREFIXES}(L) & = \{y \in \Sigma^* : \text{$yx \in L$ for some $x \in \Sigma^*$}\}.\end{aligned}

Solution

Let M=(Q,Σ,δ,q0,F)M = (Q, \Sigma, \delta, q_0, F) be a DFA solving LL. We define SQS \subseteq Q to be the set of states that are “reachable” starting from the initial state q0q_0. More precisely, S={sQ:yΣ such that δ(q0,y)=s}.S = \{s \in Q: \exists y \in \Sigma^* \text{ such that } \delta(q_0,y) = s\}. Define a DFA for each sSs \in S as follows: Ms=(Q,Σ,δ,s,F)M_s = (Q, \Sigma, \delta, s, F), so the only difference between MsM_s and MM is the starting/initial state. We claim that SUFFIXES(L)=sSL(Ms).\text{SUFFIXES}(L) = \bigcup_{s \in S} L(M_s). We now prove this equality by a double containment argument.

First, if xSUFFIXES(L)x \in \text{SUFFIXES}(L) then you know that there is some yΣy \in \Sigma^* such that yxLyx \in L. So M(yx)M(yx) accepts. Note that this yy, when fed into MM, ends up in some state. Let’s call that state ss. Then MsM_s accepts xx, i.e. xL(Ms)x \in L(M_s). And therefore xsSL(Ms)x \in \bigcup_{s \in S} L(M_s).

On the other hand suppose xsSL(Ms)x \in \bigcup_{s \in S} L(M_s). Then there is some state sSs \in S such that xL(Ms)x \in L(M_s). By the definition of SS, there is some string yy such that M(y)M(y) ends up in state ss. Since xx is accepted by MsM_s, we have that yxyx is accepted by MM, i.e. yxLyx \in L. By the definition of SUFFIXES(L)\text{SUFFIXES}(L), this implies xSUFFIXES(L)x \in \text{SUFFIXES}(L). This concludes the proof of the claim.

Since L(Ms)L(M_s) is regular for all sSs \in S and SS is a finite set, using Exercise (Finite vs infinite union) part 1, we can conclude that SUFFIXES(L)\text{SUFFIXES}(L) is regular.

For the second part, define the set R={rQ:xΣ such that δ(r,x)F}.R = \{r \in Q : \exists x \in \Sigma^* \text{ such that } \delta(r, x) \in F\}. Now we can define the DFA MR=(Q,Σ,δ,q0,R)M_R = (Q, \Sigma, \delta, q_0, R). Observe that this DFA solves PREFIXES(L)\text{PREFIXES}(L), which shows that PREFIXES(L)\text{PREFIXES}(L) is regular.

Theorem (Regular languages are closed under concatenation)

If L1,L2ΣL_1, L_2 \subseteq \Sigma^* are regular languages, then the language L1L2L_1L_2 is also regular.

Proof

Given regular languages L1L_1 and L2L_2, we want to show that L1L2L_1L_2 is regular. Since L1L_1 and L2L_2 are regular languages, by definition, there are DFAs M=(Q,Σ,δ,q0,F)M = (Q, \Sigma, \delta, q_0, F) and M=(Q,Σ,δ,q0,F)M' = (Q', \Sigma, \delta', q'_0, F') that solves L1L_1 and L2L_2 respectively. To show L1L2L_1 L_2 is regular, we’ll construct a DFA M=(Q,Σ,δ,q0,F)M'' = (Q'', \Sigma, \delta'', q''_0, F'') that solves L1L2L_1 L_2. The definition of MM'' will make use of MM and MM'.

Before we formally define MM'', we will introduce a few key concepts and explain the intuition behind the construction.

We know that wL1L2w \in L_1L_2 if and only if there is a way to write ww as uvuv where uL1u \in L_1 and vL2v \in L_2. With this in mind, we make the following definition. Given a word w=w1w2wnΣw = w_1w_2 \ldots w_n \in \Sigma^*, a concatenation thread with respect to ww is a sequence of states r0,r1,r2,,ri,si+1,si+2,,sn,r_0, r_1, r_2, \ldots, r_i, s_{i+1}, s_{i+2}, \ldots, s_n, where r0,r1,,rir_0, r_1, \ldots, r_i is an accepting computation path of MM with respect to w1w2wiw_1w_2 \ldots w_i, and q0,si+1,si+2,,snq_0', s_{i+1}, s_{i+2}, \ldots, s_n is a computation path (not necessarily accepting) of MM' with respect to wi+1wi+2wnw_{i+1}w_{i+2}\ldots w_{n}. A concatenation thread like this corresponds to simulating MM on w1w2wiw_1w_2 \ldots w_i (at which point we require that an accepting state of MM is reached), and then simulating MM' on wi+1wi+2wnw_{i+1}w_{i+2}\ldots w_n. So a concatenation thread is really a concatenation of a thread in MM with a thread in MM'.

For each way of writing ww as uvuv where uL1u \in L_1, there is a corresponding concatenation thread for it. Note that wL1L2w \in L_1L_2 if and only if there is a concatenation thread in which snFs_n \in F'. Our goal is to construct the DFA MM'' such that it keeps track of all possible concatenation threads, and if one of the threads ends with a state in FF', then MM'' accepts.

At first, it might seem like one cannot keep track of all possible concatenation threads using only constant number of states. However, this is not the case. Let’s identify a concatenation thread with its sequence of sjs_j’s (i.e. the sequence of states from QQ' corresponding to the simulation of MM'). Consider two concatenation threads (for the sake of example, let’s take n=10n = 10): s3,s4,s5,s6,s7,s8,s9,s10s5,s6,s7,s8,s9,s10\begin{aligned} s_3, s_4, & s_5, s_6, s_7, s_8, s_9, s_{10} \\ & s'_5, s'_6, s'_7, s'_8, s'_9, s'_{10}\end{aligned} If, say, si=si=qQs_i = s_i' = q' \in Q' for some ii, then sj=sjs_j = s_j' for all j>ij > i (in particular, s10=s10)s_{10} = s'_{10}). At the end, all we care about is whether s10s_{10} or s10s'_{10} is an accepting state of MM'. So at index ii, we do not need to remember that there are two copies of qq'; it suffices to keep track of one copy. In general, at any index ii, when we look at all the possible concatenation threads, we want to keep track of the unique states that appear at that index, and not worry about duplicates. Since we do not need to keep track of duplicated states, what we need to remember is a subset of QQ' (recall that a set cannot have duplicated elements).

The construction of MM'' we present below keeps track of all the concatenation threads using a constant number of states. The strategy is to keep a single thread for the machine MM, and then separate threads for machine MM' that will correspond to the MM' portions of the different concatenation threads. So the set of states is Q=Q×(Q)={(q,S):qQ,SQ},Q'' = Q \times \wp(Q') = \{(q, S): q \in Q, S \subseteq Q'\}, where the first component keeps track of the state we are at in MM, and the second component keeps track of all the unique states of MM' that we can be at if we are following one of the possible concatenation threads.

Before we present the formal definition of MM'', we introduce one more definition. Recall that the transition function of MM' is δ:Q×ΣQ\delta' : Q' \times \Sigma \to Q'. Using δ\delta' we define a new function δ:(Q)×Σ(Q)\delta_{\wp}' : \wp(Q') \times \Sigma \to \wp(Q') as follows. For SQS \subseteq Q' and σΣ\sigma \in \Sigma, δ(S,σ)\delta_{\wp}'(S, \sigma) is defined to be the set of all possible states that we can end up at if we start in a state in SS and read the symbol σ\sigma. In other words, δ(S,σ)={δ(q,σ):qS}.\delta_{\wp}'(S, \sigma) = \{\delta'(q', \sigma) : q' \in S\}. It is appropriate to view δ\delta_{\wp}' as an extension/generalization of δ\delta'.

Here is the formal definition of MM'':

  • The set of states is Q=Q×(Q)={(q,S):qQ,SQ}Q'' = Q \times \wp(Q') = \{(q, S) : q \in Q, S \subseteq Q'\}.

    (The first coordinate keeps track of which state we are at in the first machine MM, and the second coordinate keeps track of the set of states we can be at in the second machine MM' if we follow one of the possible concatenation threads.)

  • The transition function δ\delta'' is defined such that for (q,S)Q(q,S) \in Q'' and σΣ\sigma \in \Sigma, δ((q,S),σ)={(δ(q,σ),δ(S,σ))if δ(q,σ)∉F,(δ(q,σ),δ(S,σ){q0})if δ(q,σ)F.\delta''((q,S), \sigma) = \begin{cases} (\delta(q, \sigma), \delta_{\wp}'(S,\sigma)) & \text{if } \delta(q,\sigma) \not \in F, \\ (\delta(q, \sigma), \delta_{\wp}'(S,\sigma) \cup \{q_0'\}) & \text{if } \delta(q,\sigma) \in F. \end{cases}

    (The first coordinate is updated according to the transition rule of the first machine. The second coordinate is updated according to the transition rule of the second machine. Since for the second machine, we are keeping track of all possible states we could be at, the generalized transition function δ\delta_{\wp}' gives us all possible states we can go to when reading a character σ\sigma. Note that if after applying δ\delta to the first coordinate, we get a state that is an accepting state of the first machine, a new thread in MM' must be created, which corresponds to a new concatenation thread that we need to keep track of. This is accomplished by adding q0q_0' to the second coordinate.)

  • The initial state is q0={(q0,)if q0∉F,(q0,{q0})if q0F.q''_0 = \begin{cases} (q_0, \varnothing) & \text{if } q_0 \not \in F, \\ (q_0, \{q_0'\}) & \text{if } q_0 \in F. \end{cases}

    (Initially, if q0∉Fq_0 \not \in F, then there are no concatenation threads to keep track of, so the second coordinate is the empty set. On the other hand, if q0Fq_0 \in F, then there is already a concatenation thread that we need to keep track of – the one corresponding to running the whole input word ww on the second machine – so we add q0q_0' to the second coordinate to keep track of this thread.)

  • The set of accepting states is F={(q,S):qQ,SQ,SF}F'' = \{(q, S) : q \in Q, S \subseteq Q', S \cap F' \neq \varnothing\}.

    (In other words, MM'' accepts if and only if there is a state in the second coordinate that is an accepting state of the second machine MM'. So MM'' accepts if and only if one of the possible concatenation threads ends in an accepting state of MM'.)

This completes the definition of MM''. To see that MM'' indeed solves the language L1L2L_1 L_2, i.e. L(M)=L1L2L(M'') = L_1 L_2, note that by construction, MM'' with input ww, does indeed keep track of all the possible concatenation threads. And it accepts ww if and only if one of those concatenation threads ends in an accepting state of MM'. The result follows since wL1L2w \in L_1 L_2 if and only if there is a concatenation thread with respect to ww that ends in an accepting state of MM'.

We defined a generalized transition function in the proof of the above theorem. This is a useful definition and we will be using it multiple times in what comes next. Therefore we repeat the definition below.

Definition (Generalized transition function)

Let M=(Q,Σ,δ,q0,F)M = (Q, \Sigma, \delta, q_0, F) be a DFA. We define the generalized transition function δ:(Q)×Σ(Q)\delta_\wp: \wp(Q) \times \Sigma \to \wp(Q) as follows. For SQS \subseteq Q and σΣ\sigma \in \Sigma, δ(S,σ)={δ(q,σ):qS}.\delta_{\wp}(S, \sigma) = \{\delta(q, \sigma) : q \in S\}.

Exercise (Regular languages are closed under concatenation - another construction)

In the proof of Theorem (Regular languages are closed under concatenation), we defined the set of states for the DFA solving L1L2L_1L_2 as Q=Q×(Q)Q'' = Q \times \wp(Q'). Construct a DFA for L1L2L_1L_2 in which QQ'' is equal to (QQ)\wp(Q \cup Q'), i.e., specify how δ\delta'', q0q_0'' and FF'' should be defined with respect to Q=(QQ)Q'' = \wp(Q \cup Q'). Proof of correctness is not required.

Solution

As before, we have DFAs M=(Q,Σ,δ,q0,F)M = (Q, \Sigma, \delta, q_0, F) and M=(Q,Σ,δ,q0,F)M' = (Q', \Sigma, \delta', q'_0, F') solving L1L_1 and L2L_2 respectively. The construction for L1L2L_1L_2 is as follows. We are given that the set of states is Q=(QQ).Q'' = \wp(Q \cup Q'). To define the transition function, let SQ=(QQ)S'' \in Q'' = \wp(Q \cup Q') and σΣ\sigma \in \Sigma. Write SS'' as SSS \cup S', where SQS \subseteq Q and SQS' \subseteq Q'. Then, δ(SS,σ)={δ(S,σ)δ(S,σ){q0}if δ(S,σ)F,δ(S,σ)δ(S,σ)otherwise.\delta''(S \cup S',\sigma) = \begin{cases} \delta_{\wp}(S, \sigma) \cup \delta'_{\wp}(S', \sigma) \cup \{q_0'\} & \text{if } \delta_{\wp}(S, \sigma) \cap F \neq \varnothing, \\ \delta_{\wp}(S, \sigma) \cup \delta'_{\wp}(S', \sigma) & \text{otherwise.} \end{cases} The initial state is q0={{q0,q0}if q0F,{q0}if q0∉F.q''_0 = \begin{cases} \{q_0, q_0'\} & \text{if } q_0 \in F, \\ \{q_0\} & \text{if } q_0 \not \in F. \end{cases} And the set of accepting states is F={SQ:SF}.F'' = \{S \in Q'' : S \cap F' \neq \varnothing\}. Note that this construction is not really much different from the construction given in the proof of Theorem (Regular languages are closed under concatenation) because the way the initial state and the transition function is defined, a state SS'' will always have a single element from QQ. So one can view SS'' as an element of Q×(Q)Q \times \wp(Q').

Exercise (Regular languages are closed under star - wrong proof)

Critique the following argument that claims to establish that regular languages are closed under the star operation, that is, if LL is a regular language, then so is LL^*.

Let LL be a regular language. We know that by definition L=nNLnL^* = \bigcup_{n \in \mathbb N} L^n, where Ln={u1u2un:uiL for all i}.L^n = \{u_1 u_2 \ldots u_n : u_i \in L \text{ for all $i$} \}. We know that for all nn, LnL^n must be regular using Theorem (Regular languages are closed under concatenation). And since LnL^n is regular for all nn, we know LL^* must be regular using Theorem (Regular languages are closed under union).

Solution

It is true that using induction, we can show that LnL^n is regular for all nn. However, from there, we cannot conclude that L=nNLnL^* = \bigcup_{n \in \mathbb N} L^n is regular. Even though regular languages are closed under finite unions, they are not closed under infinite unions. See Exercise (Finite vs infinite union).

Exercise (Regular languages are closed under star - correct proof)

Show that regular languages are closed under the star operation as follows: First show that if LL is regular, then so is L+L^+, which is defined as the union L+=nN+Ln.L^+ = \bigcup_{n \in \mathbb N^+} L^n. For this part, given a DFA for LL, show how to construct a DFA for L+L^+. A proof of correctness is not required. In order to conclude that LL^* is regular, observe that L=L+{ϵ}L^* = L^+ \cup \{\epsilon\}, and use the fact that regular languages are closed under union.

Solution

We construct a DFA M=(Q,Σ,δ,q0,F)M' = (Q', \Sigma, \delta', q_0', F') solving L+L^+ using a DFA M=(Q,Σ,δ,q0,F)M = (Q, \Sigma, \delta, q_0, F) that solves LL. The construction is as follows. Q=(Q).Q' = \wp(Q). So the elements of QQ' are subsets of QQ. To define the transition function, for any SQS \subseteq Q and any σΣ\sigma \in \Sigma, let δ(S,σ)={δ(S,σ){q0}if δ(S,σ)F,δ(S,σ)otherwise.\delta'(S,\sigma) = \begin{cases} \delta_{\wp}(S, \sigma) \cup \{q_0\} & \text{if } \delta_{\wp}(S,\sigma) \cap F \neq \varnothing, \\ \delta_{\wp}(S, \sigma) & \text{otherwise.} \end{cases} The initial state is q0={q0}q_0' = \{q_0\}. And the set of accepting states is

4
Bonus: Non-Deterministic FA

To be added.

5
Check Your Understanding
Problem
  1. What are the 55 components of a DFA?

  2. Let MM be a DFA. What does L(M)L(M) denote?

  3. Draw a DFA solving \varnothing. Draw a DFA solving Σ\Sigma^*.

  4. True or false: Given a language LL, there is at most one DFA that solves it.

  5. Fix some alphabet Σ\Sigma. How many DFAs are there with exactly one state?

  6. True or false: For any DFA, all the states are “reachable”. That is, if D=(Q,Σ,δ,q0,F)D = (Q, \Sigma, \delta, q_0, F) is a DFA and qQq \in Q is one of its states, then there is a string wΣw \in \Sigma^* such that δ(q0,w)=q\delta^*(q_0,w) = q.

  7. What is the set of all possible inputs for a DFA D=(Q,Σ,δ,q0,F)D = (Q, \Sigma, \delta, q_0, F)?

  8. Consider the set of all DFAs with kk states over the alphabet Σ={a}\Sigma =\{{\texttt{a}}\} such that all states are reachable from q0q_0. What is the cardinality of this set?

  9. What is the definition of a regular language?

  10. Describe the general strategy (the high level steps) for showing that a language is not regular.

  11. Give 33 examples of non-regular languages.

  12. Let L{a}L \subseteq \{{\texttt{a}}\}^* be a language consisting of all strings of a{\texttt{a}}’s of odd length except length 251251. Is LL regular?

  13. Let LL be the set of all strings in {0,1}\{{\texttt{0}},{\texttt{1}}\}^* that contain at least 251251 0{\texttt{0}}’s and at most 251251 1{\texttt{1}}’s. Is LL regular?

  14. Suppose you are given a regular language LL and you are asked to prove that any DFA solving LL must contain at least kk states (for some kk value given to you). What is a general proof strategy for establishing such a claim?

  15. Let M=(Q,Σ,δ,q0,F)M = (Q, \Sigma, \delta, q_0, F) be a DFA solving a language LL and let M=(Q,Σ,δ,q0,F)M' = (Q', \Sigma, \delta', q'_0, F') be a DFA solving a language LL'. Describe the 55 components of a DFA solving LLL \cup L'.

  16. True or false: Let L1L2L_1 \oplus L_2 denote the set of all words in either L1L_1 or L2L_2, but not both. If L1L_1 and L2L_2 are regular, then so is L1L2L_1 \oplus L_2.

  17. True or false: For languages LL and LL', if LLL \subseteq L' and LL is non-regular, then LL' is non-regular.

  18. True or false: If LΣL \subseteq \Sigma^* is non-regular, then L=ΣL\overline{L} = \Sigma^* \setminus L is non-regular.

  19. True or false: If L1,L2ΣL_1, L_2 \subseteq \Sigma^* are non-regular languages, then so is L1L2L_1 \cup L_2.

  20. True or false: Let LL be a non-regular language. There exists KLK \subset L, KLK \neq L, such that KK is also non-regular.

  21. By definition a DFA has finitely many states. What is the motivation/reason for this restriction?

  22. Consider the following decision problem: Given as input a DFA, output True if and only if there exists some string sΣs \in \Sigma^* that the DFA accepts. Write the language corresponding to this decision problem using mathematical notation, and in particular, using set builder notation.

6
High-Order Bits
Important Note

Here are the important things to keep in mind from this chapter.

  1. Given a DFA, describe in English the language that it solves.

  2. Given the description of a regular language, come up with a DFA that solves it.

  3. Given a non-regular language, prove that it is indeed non-regular. Make sure you are not just memorizing a template for these types of proofs, but that you understand all the details of the strategy being employed. Apply the Feynman technique and see if you can teach this kind of proof to someone else.

  4. In proofs establishing a closure property of regular languages, you start with one or more DFAs, and you construct a new one from them. In order to build the new DFA successfully, you have to be comfortable with the 5-tuple notation of a DFA. The constructions can get notation-heavy (with power sets and Cartesian products), but getting comfortable with mathematical notation is one of our learning objectives.

  5. With respect to closure properties of regular languages, prioritize having a very solid understanding of closure under the union operation before you move to the more complicated constructions for the concatenation operation and star operation.

The letter \(L\) can be overloaded: we often use it to denote a language \(L \subseteq \Sigma^*\), but in this notation, it represents a function that maps a DFA to a language. Given the context, this overloading should not create any ambiguity.
The pigeonhole principle states that if \(n\) items are put inside \(m\) containers, and \(n > m\), then there must be at least one container with more than one item. The name pigeonhole principle comes from thinking of the items as pigeons, and the containers as holes. The pigeonhole principle is often abbreviated as PHP.
At this point, it would be reasonable to end the proof and say that the construction of \(M''\) is correct based on our discussion earlier that motivates the construction. It is also reasonable to spell out the proof of correctness, as we do here. In situations like these, in an introductory course like this one, we recommend you err on the side of caution and spell out the correctness proof.
Recall that for any set \(Q\), the set of all subsets of \(Q\) is called the power set of \(Q\), and is denoted by \(\wp(Q)\).
Note (State diagram of a DFA)

It is very common to represent a DFA with a state diagram. Below is an example of how we draw a state diagram of a DFA:

In this example, \(\Sigma = \{{\texttt{0}},{\texttt{1}}\}\), \(Q = \{q_0,q_1,q_2,q_3\}\), \(F = \{q_1,q_2\}\). The labeled arrows between the states encode the transition function \(\delta\), which can also be represented with a table as shown below (row \(q_i \in Q\) and column \(b \in \Sigma\) contains \(\delta(q_i, b)\)).

The start state is labeled with \(q_0\), but if another label is used, we can identify the start state in the state diagram by identifying the state with an arrow that does not originate from another state and goes into that state.

See in chapter
Exercise (Draw DFAs)

For each language below (over the alphabet \(\Sigma = \{{\texttt{0}},{\texttt{1}}\}\)), draw a DFA solving it.

  1. \(\{{\texttt{101}}, {\texttt{110}}\}\)

  2. \(\{{\texttt{0}},{\texttt{1}}\}^* \setminus \{{\texttt{101}}, {\texttt{110}}\}\)

  3. \(\{x \in \{{\texttt{0}},{\texttt{1}}\}^* : \text{$x$ starts and ends with the same bit}\}\)

  4. \(\{{\texttt{110}}\}^* = \{\epsilon, {\texttt{110}}, {\texttt{110110}}, {\texttt{110110110}}, \ldots\}\)

  5. \(\{x \in \{{\texttt{0}},{\texttt{1}}\}^*: \text{$x$ contains ${\texttt{110}}$ as a substring}\}\)

See in chapter
Exercise (Draw DFAs)

For each language below (over the alphabet \(\Sigma = \{{\texttt{0}},{\texttt{1}}\}\)), draw a DFA solving it.

  1. \(\{{\texttt{101}}, {\texttt{110}}\}\)

  2. \(\{{\texttt{0}},{\texttt{1}}\}^* \setminus \{{\texttt{101}}, {\texttt{110}}\}\)

  3. \(\{x \in \{{\texttt{0}},{\texttt{1}}\}^* : \text{$x$ starts and ends with the same bit}\}\)

  4. \(\{{\texttt{110}}\}^* = \{\epsilon, {\texttt{110}}, {\texttt{110110}}, {\texttt{110110110}}, \ldots\}\)

  5. \(\{x \in \{{\texttt{0}},{\texttt{1}}\}^*: \text{$x$ contains ${\texttt{110}}$ as a substring}\}\)

See in chapter
Exercise (Draw DFAs)

For each language below (over the alphabet \(\Sigma = \{{\texttt{0}},{\texttt{1}}\}\)), draw a DFA solving it.

  1. \(\{{\texttt{101}}, {\texttt{110}}\}\)

  2. \(\{{\texttt{0}},{\texttt{1}}\}^* \setminus \{{\texttt{101}}, {\texttt{110}}\}\)

  3. \(\{x \in \{{\texttt{0}},{\texttt{1}}\}^* : \text{$x$ starts and ends with the same bit}\}\)

  4. \(\{{\texttt{110}}\}^* = \{\epsilon, {\texttt{110}}, {\texttt{110110}}, {\texttt{110110110}}, \ldots\}\)

  5. \(\{x \in \{{\texttt{0}},{\texttt{1}}\}^*: \text{$x$ contains ${\texttt{110}}$ as a substring}\}\)

See in chapter
Exercise (Would any set of pigeons work?)

In the proof of the above theorem, we defined the set \(P = \{{\texttt{0}}^n : n \in \{0,1,\ldots,k\}\}\) and then applied the pigeonhole principle. Explain why picking the following sets would not have worked in that proof.

  1. \(P = \{{\texttt{1}}^n : n \in \{1,2,\ldots,k+1\}\}\)

  2. \(P = \{{\texttt{1}}, {\texttt{11}}, {\texttt{000}}, {\texttt{0000}}, {\texttt{00000}}, \ldots, {\texttt{0}}^{k+1}\}\)

See in chapter
Exercise (A fooling pigeon set)

In Exercise (Would any set of pigeons work?) we saw that the “pigeon set” that we use to apply the pigeonhole principle must be chosen carefully. We’ll call a pigeon set a fooling pigeon set if it is a pigeon set that “works”. That is, given a DFA with \(k\) states that is assumed to solve a non-regular \(L\), a fooling pigeon set of size \(k+1\) allows us to carry out the contradiction proof, and conclude that \(L\) is non-regular. Identify the property that a fooling pigeon set should have.

See in chapter
Theorem (\({\texttt{0}}^n {\texttt{1}}^n\) is not regular)

Let \(\Sigma = \{{\texttt{0}},{\texttt{1}}\}\). The language \(L = \{{\texttt{0}}^n {\texttt{1}}^n: n \in \mathbb N\}\) is not regular.

See in chapter
Theorem (Regular languages are closed under union)

Let \(\Sigma\) be some finite alphabet. If \(L_1 \subseteq \Sigma^*\) and \(L_2 \subseteq \Sigma^*\) are regular languages, then the language \(L_1 \cup L_2\) is also regular.

See in chapter
Theorem (Regular languages are closed under union)

Let \(\Sigma\) be some finite alphabet. If \(L_1 \subseteq \Sigma^*\) and \(L_2 \subseteq \Sigma^*\) are regular languages, then the language \(L_1 \cup L_2\) is also regular.

See in chapter
Exercise (Are regular languages closed under complementation?)

Is it true that if \(L\) is regular, then its complement \(\Sigma^* \setminus L\) is also regular? In other words, are regular languages closed under the complementation operation?

See in chapter
Exercise (Finite vs infinite union)
  1. Suppose \(L_1, \ldots, L_k\) are all regular languages. Is it true that their union \(\bigcup_{i=0}^k L_i\) must be a regular language?

  2. Suppose \(L_0, L_1, L_2, \ldots\) is an infinite sequence of regular languages. Is it true that their union \(\bigcup_{i\geq 0} L_i\) must be a regular language?

See in chapter
Theorem (Regular languages are closed under concatenation)

If \(L_1, L_2 \subseteq \Sigma^*\) are regular languages, then the language \(L_1L_2\) is also regular.

See in chapter
Theorem (Regular languages are closed under concatenation)

If \(L_1, L_2 \subseteq \Sigma^*\) are regular languages, then the language \(L_1L_2\) is also regular.

See in chapter
Theorem (Regular languages are closed under concatenation)

If \(L_1, L_2 \subseteq \Sigma^*\) are regular languages, then the language \(L_1L_2\) is also regular.

See in chapter
Theorem (Regular languages are closed under union)

Let \(\Sigma\) be some finite alphabet. If \(L_1 \subseteq \Sigma^*\) and \(L_2 \subseteq \Sigma^*\) are regular languages, then the language \(L_1 \cup L_2\) is also regular.

See in chapter
Exercise (Finite vs infinite union)
  1. Suppose \(L_1, \ldots, L_k\) are all regular languages. Is it true that their union \(\bigcup_{i=0}^k L_i\) must be a regular language?

  2. Suppose \(L_0, L_1, L_2, \ldots\) is an infinite sequence of regular languages. Is it true that their union \(\bigcup_{i\geq 0} L_i\) must be a regular language?

See in chapter